allowing thousand separators + fix the consumed indicator by lemire · Pull Request #111 · CarlVerret/csFastFloat

lemire · 2026-04-27T15:18:24Z

This addresses two issues. The first is a small bug in the way we report characters_consumed from TryParseNumber: in the UTF-16 path of FastFloatParser and the UTF-8 path of FastDoubleParser, we were advancing the start pointer past leading whitespace before parsing the number, then setting characters_consumed = pns.characters_consumed — which is measured from the new start, so the leading whitespace was silently dropped from the count. The two sibling paths already tracked a leading_spaces counter and did the addition; this PR brings the other two in line. While I was there I also turned the ThrowArgumentException() on empty input in FastFloatParser.TryParseNumber(byte*, ...) into a return false, since Try* should not throw.

The second change is motivated by issue #110, where FastFloatParser.ParseFloat("1,234,567.89") returns 1 while float.Parse("1,234,567.89", CultureInfo.InvariantCulture) returns 1234567.9. The reason is that the BCL’s Parse(string, IFormatProvider) overload defaults to NumberStyles.Float | NumberStyles.AllowThousands, and we did not honour AllowThousands at all. To make matching that behaviour possible without changing our defaults, this PR adds an optional thousands_separator parameter (default ',' for the char overloads and (byte)',' for the byte overloads) on every public Parse* / TryParse* entry point. When NumberStyles.AllowThousands is in styles and the thousands separator differs from the decimal separator, ParsedNumberString will accept a separator that lies strictly between two digits in the
integer part of the number; otherwise the previous behaviour is preserved. The European convention (thousands_separator: '.', decimal_separator: ',') works as you would expect. Default behaviour is unchanged: without AllowThousands, parsing "1,234" still stops at the comma and consumes one character.

The change to ParsedNumberString.ParseNumberString was deliberately small. I switched the integer-part loop from raw pointer arithmetic to a counter so that the reported digit count is correct in the presence of separators, and I extended the rare reparse path used when there are more than nineteen digits so that it skips separators while computing the truncated mantissa and exponent. The fractional, exponent, sign, and SIMD code paths are untouched.

Tests cover both fixes. There are new theories for characters_consumed correctness across the string, char*, and byte* overloads of both parsers, including a "walk through the buffer" test that would have caught the original bug. There are also tests for the new parameter: BCL-parity for typical inputs like 1,234,567.89, the European convention, no-op behaviour when AllowThousands is absent, the byte* path, and rejection of a leading separator. The 24-digit case (1,234,567,890,123,456,789,012,345) exercises the long-mantissa reparse path.

A short benchmark comparison (BenchmarkDotNet ShortRunJob, three iterations on canada/mesh/synthetic) shows no regression.

Note that this does not change the default styles for ParseFloat(string) to include AllowThousands — doing so would change the meaning of inputs that today parse to a partial result, which seemed too aggressive to do in the same change. The PR #110 test case will succeed once the caller opts in by passing NumberStyles.Float | NumberStyles.AllowThousands.

lemire · 2026-04-27T15:19:09Z

@nietras Would that work?

nietras · 2026-04-29T12:15:35Z

Would that work?

@lemire I would think so, will try to test in Sep soon hopefully. Thanks :)

allowing thousand separators + fix the consumed indicator

288a8d3

lemire requested a review from CarlVerret April 27, 2026 15:18

update

ec38178

CarlVerret merged commit 08263c1 into master Apr 27, 2026
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

allowing thousand separators + fix the consumed indicator#111

allowing thousand separators + fix the consumed indicator#111
CarlVerret merged 2 commits intomasterfrom
lemire/allowthousands

lemire commented Apr 27, 2026

Uh oh!

lemire commented Apr 27, 2026

Uh oh!

Uh oh!

nietras commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

lemire commented Apr 27, 2026

Uh oh!

lemire commented Apr 27, 2026

Uh oh!

Uh oh!

nietras commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants